1 TUTORIAL GOALS

The goal of this tutorial is to present the Empirical Bayes (EB) method for analyzing before-after
crash data in a step-by-step format. The tutorial is designed to be used in conjunction with the
companion Empirical Bayes Excel Spreadsheet.

2 EMPIRICAL BAYES METHOD

The Empirical Bayes (EB) process consists of five steps: determining (1) the safety performance
function, SPF, (2) the overdispersion parameter, $\phi$, (3) the relative weights, $\alpha$, (4) the
estimated expected crashes, $\pi$, and (5) the index of effectiveness, $\theta$. Each of these steps
is described more fully below.

2.1 Determination of the Safety Performance Function, SPF 

The first step in the Empirical Bayes process is to determine a unique Safety Performance 
Function (SPF). The SPF is a mathematical model that predicts an estimate of crash occurrence 
for a given roadway segment (1). According to Hauer, crash occurrence is best modeled using a 
multivariate statistical model (2). A model is simply an equation or set of equations that link the 
expected crash frequency on the roadway to measurable roadway traits such as AADT, length of 
roadway segment, roadway width, shoulder width, number of lanes, etc. 

The SPF is determined from the data collected in the period before any treatments were made to 
the roadway segment and therefore can consider data available from previously identified "case" 
and "control" sites to increase the size of the sample and enhance the accuracy of the predictive 
model. The SPF can then be used to predict the number of crashes expected to occur each year 
at the "case" sites had there been no improvements to the roadway. Each type of roadway, 
Interstate and non-Interstate, will have different SPFs to predict the expected number of crashes. 
The multivariate statistical model used to establish the SPF can be determined using various 
statistical modeling computer software packages on the market today. LIMDEP Version 7.0 was 
used in this investigation to determine the SPFs for the two types of roadways in question. 
Based on the crash and roadway parameter data available, the SPF was modeled using a multiple 
linear regression equation that estimates the number of crashes per three years per roadway
segment that occurred in the "before" period. The multiple linear regression equation is of the
following form: 

$$SPF_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$$

where:

$SPF_i$ denotes the dependent variable (crashes per three years for roadway segment $i$
before any treatment)

$x_{i1}$ through $x_{i,p-1}$ denote the independent, explanatory variables (average annual daily
traffic (AADT), number of lanes, lane width, shoulder width, speed limit, etc.) for
roadway segment $i$

$\beta_0$ through $\beta_{p-1}$ denote estimable parameters (determined by LIMDEP); $\beta_0$ represents
the y-intercept value and

$\varepsilon_i$ denotes the unexplainable, random error not accounted for in the model.

Using the crash data provided as part of this study, the unique estimated SPFs for Interstate and 
non-Interstate roadways are described below and detailed in Tables 1 and 2, respectively. In 
each case, average annual daily traffic (AADT) and segment length were found to be significant 
in affecting crash occurrence at the 95-percent confidence level (t-statistics > 1.96 confirm 
significance): 

$$SPF_i = \beta_0 + \beta_1 (L_i) + \beta_2 (AADT_i)$$

where:

$L_i$ denotes the length of roadway segment $i$

$AADT_i$ denotes the annual average daily traffic per three years on roadway segment $i$
and all other variables are as previously defined.

Variables such as the number of lanes, lane width, shoulder width, speed limit and others
varied little across the sample and hence did not prove to be significant explanatory
variables for crash occurrence. For example, lane widths were consistently 12 feet for roadway
segments experiencing both high and low crash occurrences, precluding any meaningful
correlation.



Further, key variables specific to reconstruction and pavement preservation treatments (as 
compared to direct safety treatments) and their resulting effects on the roadway environment 
were omitted from the available data set. For example, data describing roadway surface 
condition (i.e., ride quality, surface friction, degree of rutting or cracking, etc.) prior to and 
following treatment may have proven significant in predicting the change in crash occurrence for 
these types of treatments. 

Table 1. Model Parameters for Interstate Highways

Variable       $\beta_0$, $\beta_1$, and $\beta_2$    t-statistic    Standard Error
y-intercept    1.812309                               3.584          0.50568
$L_i$          0.108752                               3.435          0.03166
$AADT_i$       0.000167                               5.189          0.00003
$\phi$         0.078141                               2.266          0.03448



Table 2. Model Parameters for Non-Interstate Highways

Variable       $\beta_0$, $\beta_1$, and $\beta_2$    t-statistic    Standard Error
y-intercept    1.207848                               4.302          0.28075
$L_i$          0.063425                               3.506          0.01809
$AADT_i$       0.000560                               4.861          0.00012
$\phi$         0.182151                               2.54           0.07171



With only two significant variables and a small number of roadway segments in the sample, the
goodness of fit for either model is low; the adjusted $\rho^2$-values for Interstate and
non-Interstate roadway segments are 0.240 and 0.201, respectively. A $\rho^2$-value equal to 1.0
indicates a perfect model. Also included in Tables 1 and 2 are the estimated overdispersion
parameters, $\phi$, for each roadway type, which are discussed in detail in the next section.
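
The SPF can be evaluated directly from the fitted coefficients. The following is a minimal Python
sketch, assuming the coefficient values transcribed from Tables 1 and 2 and a segment length taken
as the difference between the ending and beginning mileposts. The function and dictionary names
are illustrative only and are not part of the companion spreadsheet.

```python
# Coefficients transcribed from Tables 1 and 2 (units: crashes per three years).
# The dictionary layout and key names are assumptions made for this sketch.
SPF_COEFFICIENTS = {
    "interstate": {"intercept": 1.812309, "length": 0.108752, "aadt": 0.000167},
    "non_interstate": {"intercept": 1.207848, "length": 0.063425, "aadt": 0.000560},
}


def spf(road_type: str, length: float, aadt: float) -> float:
    """Predicted crashes per three years for one segment using the linear SPF."""
    b = SPF_COEFFICIENTS[road_type]
    return b["intercept"] + b["length"] * length + b["aadt"] * aadt


# Example inputs: an Interstate segment 4.8 long (MP 115.9 to MP 120.7), AADT 3456.
print(round(spf("interstate", 4.8, 3456), 1))  # prints 2.9
```

Because the model was fitted to three-year crash totals, the value returned is a three-year
prediction, not an annual one.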



2.2 Determination of the Overdispersion Parameter, $\phi$



To estimate a roadway's SPF, it is necessary to assume an underlying probability distribution for
the crash frequencies. Historically, crash frequencies were often assumed to follow a Poisson
distribution. The Poisson distribution assumes that the mean and variance observed for the crash
frequency variable are equal. Studies have shown that the differences between the crash
frequencies and model predictions based on a Poisson distribution are inconsistent, likely
resulting from a violation of this equality assumption (3). Therefore, researchers more
commonly assume a negative binomial distribution to represent the distribution of crash
frequencies (3). One of the parameters used to confirm whether the underlying probability
distribution is correctly identified as negative binomial is the overdispersion parameter, $\phi$.
Data is said to be overdispersed if the variance of the dependent variable exceeds its mean (i.e.,
violating the constraints of the Poisson distribution). For both Interstate and non-Interstate
roadways considered in this investigation, the data was confirmed to be overdispersed (the
variance of the crash frequency variable exceeded the mean) as evidenced by a statistically
significant $\phi$-value at the 95-percent confidence level.

Proceeding with the EB method, this overall overdispersion parameter, representing all roadway
segments in combination, is then used to account for varying degrees of overdispersion
between roadway segments attributable to differences in roadway traits and crash occurrences. If
each roadway segment were of equal length and had consistent geometric characteristics, traffic
characteristics, etc., the overall overdispersion parameter would be directly applicable to each
individual roadway segment. However, since the roadway segments vary in length and
characteristics, a unique overdispersion parameter, $\phi_i$, must be determined for each roadway
segment. Segment length is assumed to be a primary determinant affecting individual
overdispersion parameter values. Under this assumption, using the overall overdispersion
parameter as the overdispersion of each individual segment would skew the model by placing
more emphasis on the longer roadway segments (3). To better estimate the expected number of
crashes for each individual roadway segment, the overdispersion parameter can be adjusted
based on length to represent the individual segment, $i$:

$$\phi_i = \phi \cdot L_i^{\beta}$$

where:

$\phi_i$ denotes the adjusted overdispersion parameter for roadway segment $i$

$\phi$ denotes the overall overdispersion parameter for all combined roadway segments

$L_i$ denotes the length of roadway segment $i$

$\beta$ is a constant between 0 and 1 (3).

The value of $\beta$ takes into account the differences in geometric characteristics, traffic
characteristics, etc. between the individual roadway segments; if each roadway segment was
completely dissimilar from other roadway segments (i.e., had no characteristic similarities to the
other roadway segments), $\beta = 0$ and the roadway segment in question would be represented by
the overall overdispersion parameter. Alternately, if each of the roadway segments had exactly
the same characteristics as all other segments of the roadway, $\beta = 1$ and the overdispersion for
the roadway segment in question would be represented by the overall overdispersion parameter
adjusted only by the segment length. A value of $\beta$ somewhere between zero and one is most
representative for segments defined along a continuous roadway (3). However, a value of $\beta$ equal
to 1 was assumed for this study to provide the most conservative estimates of future crash
occurrences.
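
The length-based adjustment can be sketched in a few lines of Python. This is an illustrative
sketch only: it assumes the relationship $\phi_i = \phi \cdot L_i^{\beta}$ with the exponent
restricted to the range 0 to 1, and the function name and argument names are hypothetical.

```python
def overdispersion_by_length(phi: float, length: float, beta: float = 1.0) -> float:
    """Segment-specific overdispersion parameter, phi_i = phi * L_i ** beta.

    beta = 0 returns the overall parameter unchanged; beta = 1 (the value
    assumed in this study) scales it by the segment length.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is expected to lie between 0 and 1")
    return phi * length ** beta


# Interstate overall parameter from Table 1 and an example segment length of 4.8.
print(round(overdispersion_by_length(0.078141, 4.8), 2))  # prints 0.38
```

With $\beta = 1$ the result reproduces the 0.38 listed for the first Interstate segment in Table 3.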

An alternative method for determining the adjusted overdispersion parameter assumes a unique
gamma distribution for each roadway segment, $i$ (3). The individual overdispersion parameters
using this alternate method can be calculated as follows:

$$\phi_i = \phi \cdot SPF_i^{\gamma}$$

where $\gamma$ is a constant between 0 and 1 and all other variables are as previously defined (3).

If the parameter $\gamma$ is set to zero, then the standard negative binomial model is obtained. If
$\gamma$ is greater than zero, then the variance of the gamma distribution decreases as $SPF_i$
increases (3). This method of analysis has been used by a number of researchers to determine the
individual overdispersion parameter for before-after studies and likely yields more accurate
results; however, determination of the $\gamma$-value requires analysis not typically employed in
practice. Miaou and Lum modeled crash occurrence on rural interstate highways using $\gamma = 1$ (4).
Previously published $\gamma$-values for similar roadway types may be transferable. Hence, this
investigation also assumed $\gamma = 1$.
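
The SPF-based alternative can be sketched the same way. The sketch below assumes the form
$\phi_i = \phi \cdot SPF_i^{\gamma}$, which is consistent with the column headings and values of
Tables 3 through 5 when $\gamma = 1$; the function name is hypothetical.

```python
def overdispersion_by_spf(phi: float, spf_value: float, gamma: float = 1.0) -> float:
    """Segment-specific overdispersion parameter, phi_i = phi * SPF_i ** gamma.

    gamma = 0 gives the standard negative binomial case (phi_i = phi);
    gamma = 1 is the value assumed in this investigation.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is expected to lie between 0 and 1")
    return phi * spf_value ** gamma


# Interstate overall parameter (Table 1) and an example SPF prediction of 2.9114706.
print(round(overdispersion_by_spf(0.078141, 2.9114706), 2))  # prints 0.23
```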

With no superior method for determining individual overdispersion parameters emerging, this
investigation used both methods and carried the two sets of results forward throughout the
remainder of the EB process. Tables 3, 4 and 5 summarize the results.



Table 3. Overdispersion Parameters and Relative Weights for Interstate Roadway Segments

                                                   $\phi_i = \phi L_i$           $\phi_i = \phi\,SPF_i$
PROJECT I.D.      PROJECT NAME                     Overdispersion  Relative      Overdispersion  Relative
                                                   Parameter       Weight        Parameter       Weight
I-15-2-(70)116    BUXTON INTERCHANGE. - N & S      0.38            0.114         0.23            0.072
I-15-4(75)200     LINCOLN ROAD - SIEBEN            1.25            0.233         0.32            0.072
I-15-6(28)323     BRADY - NO. & SO. (NORTHBOUND)   0.92            0.202         0.28            0.072
I-90-1(119)74     ALBERTON - EAST & WEST           0.83            0.167         0.32            0.072
I-90-5(53)240     PIPESTONE EAST & WEST            0.66            0.147         0.30            0.072
I-90-7(70)341     MISSION INTERCHANGE - EAST       0.88            0.172         0.33            0.072
I-90-9(81)503     DUNMORE - SOUTH                  0.42            0.113         0.26            0.072
I-90-8(131)450    27TH ST. - LOCKWOOD              0.28            0.050         0.41            0.072
I-94-3(50)115     5.3 KM WEST OF HATHAWAY - EAST   1.04            0.215         0.30            0.072
I-94-4(56)129     MILES CITY - EAST & WEST         0.99            0.205         0.30            0.072
I-94-4(57)143     BAKER INTERCHANGE - EAST         0.39            0.119         0.23            0.072
I-94-5(27)163     PRAIRIE COUNTY LINE - EAST       0.50            0.143         0.23            0.072
I-94-6(45)191     DAWSON COUNTY LINE - EAST        1.48            0.252         0.34            0.072



Table 4. Overdispersion Parameters and Relative Weights for Highway Reconstruction Roadway
Segments

                                                   $\phi_i = \phi L_i$           $\phi_i = \phi\,SPF_i$
PROJECT I.D.      PROJECT NAME                     Overdispersion  Relative      Overdispersion  Relative
                                                   Parameter       Weight        Parameter       Weight
STPP13-1(22)0     REYNOLDS PASS - NORTH            1.57            0.355         0.52            0.154
STPP14-2(12)33    WHITE SULPHUR SPRINGS - SOUTH    1.63            0.357         0.54            0.154
NH16-1(35)23      YELLOWSTONE CO. LINE (N. - S.)   1.16            0.286         0.53            0.154
STPP52-2(20)40    CRESTON NORTH                    1.41            0.273         0.68            0.154
NH1-1(37)69       HAPPY'S INN E & W                2.11            0.391         0.60            0.154
STPP13-1(19)65    NORRIS - HARRISON                1.73            0.366         0.55            0.154
NH53-1(18)16      ACTON - NORTHWEST                2.09            0.384         0.61            0.154



Table 5. Overdispersion Parameters and Relative Weights for Highway Preservation Roadway
Segments

                                                   $\phi_i = \phi L_i$           $\phi_i = \phi\,SPF_i$
PROJECT I.D.      PROJECT NAME                     Overdispersion  Relative      Overdispersion  Relative
                                                   Parameter       Weight        Parameter       Weight
NH1-8(20)72       MALTA - SACO                     4.99            0.576         0.67            0.154
STPN5-2(81)79     ELMO - NORTH                     2.48            0.385         0.72            0.154
NH11-1(30)14      YANKEE JIM CANYON - NORTH        1.93            0.405         0.52            0.154
NH11-1(31)24      EMIGRANT NORTH - SOUTH           1.81            0.379         0.54            0.154
STPP13-1(27)24    McATEE (NORTH - SOUTH)           2.12            0.445         0.48            0.154
STPN24-1(48)32    CLEARWATER JCT. - EAST           4.35            0.529         0.71            0.154
NH37-2(19)62      ASHLAND - EAST                   2.59            0.500         0.47            0.154



2.3 Determination of the Relative Weight, $\alpha$

To adjust for varying degrees of overdispersion, a relative weight, $\alpha_i$, is applied to each
roadway segment. The segment-specific relative weight is determined as follows:

$$\alpha_i = \frac{1}{1 + SPF_i / \phi_i}$$

where $\alpha_i$ denotes the relative weight applied to roadway segment $i$ and all other variables
are as previously defined (1). The roadway segment relative weights for this investigation are
provided in Tables 3 through 5.
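
A short Python sketch of the relative weight follows. It assumes the form
$\alpha_i = 1 / (1 + SPF_i / \phi_i)$, which reproduces the relative weights listed in Tables 3
through 5 from the tabulated overdispersion parameters; the function name is hypothetical.

```python
def relative_weight(spf_value: float, phi_i: float) -> float:
    """Relative weight alpha_i = 1 / (1 + SPF_i / phi_i) for one segment."""
    if phi_i <= 0:
        raise ValueError("phi_i must be positive")
    return 1.0 / (1.0 + spf_value / phi_i)


spf_value = 2.9114706  # example SPF prediction for an Interstate segment
phi = 0.078141         # overall Interstate overdispersion parameter (Table 1)

# Length-based parameter (phi * L_i with L_i = 4.8)
print(round(relative_weight(spf_value, phi * 4.8), 3))        # prints 0.114
# SPF-based parameter (phi * SPF_i); SPF_i cancels, so the weight is constant
print(round(relative_weight(spf_value, phi * spf_value), 3))  # prints 0.072
```

Under the SPF-based method with $\gamma = 1$, the weight reduces to $1 / (1 + 1/\phi)$, which is
why every segment of a given roadway type shares the same value in the tables.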

2.4 Determination of Estimated Expected Crashes, $\pi$

Once the previous steps have been completed, the estimate of the expected crashes for a given 
roadway segment can be calculated using the following equation (1): 

$$\pi_i = (\alpha_i)(SPF_i) + (1 - \alpha_i)(\lambda_i)$$

where:

$\pi_i$ denotes the expected number of crashes per three years on roadway segment $i$

$\lambda_i$ denotes the actual number of crashes per three years on roadway segment $i$
and all other variables are as previously defined.
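
The weighted combination can be illustrated with a brief Python sketch. It assumes the weighted
average of the SPF prediction and the observed count given above; the function name and the
example inputs are illustrative.

```python
def eb_expected_crashes(alpha: float, spf_value: float, observed: float) -> float:
    """EB estimate pi_i = alpha_i * SPF_i + (1 - alpha_i) * lambda_i."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * spf_value + (1.0 - alpha) * observed


# Example: relative weight 0.114, SPF prediction 2.9, 10 observed crashes.
print(round(eb_expected_crashes(0.114, 2.9, 10), 1))  # prints 9.2
```

A small relative weight pulls the estimate toward the observed count, while a weight near 1
pulls it toward the SPF prediction.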

2.5 Determination of the Index of Effectiveness, $\theta$

The last step in the EB process is to express the resulting effectiveness of any treatment (i.e., 
roadway reconstruction and pavement preservation improvements, safety improvements, etc.) as 
a relative difference in crash occurrence between actual and expected. With the expected crash 
occurrence determined in the previous step and the actual crash occurrence observed, the 
difference can be calculated directly. However, this direct calculation method does not account 
for the uncertainty resulting from (1) sampling such a small number of projects to represent the 
larger population, (2) the resulting low explanatory power (i.e., goodness of fit) of the SPF, (3) 
the assumptions supporting the determination of the overdispersion parameters and relative 
weights and (4) the overall underlying data variability project to project. Instead, an index of
effectiveness, $\theta_i$, that takes into account this uncertainty through the data variance
observed for each roadway segment must be determined. The variance, $\sigma_i^2$, can be calculated
as follows:

$$\sigma_i^2 = (1 - \alpha_i)\,\pi_i$$

where $\alpha_i$ and $\pi_i$ are as previously defined.
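
The variance and the corresponding standard deviation can be computed as in the following Python
sketch, which assumes the relationship $\sigma_i^2 = (1 - \alpha_i)\,\pi_i$; the function names are
hypothetical.

```python
import math


def eb_variance(alpha: float, expected: float) -> float:
    """Variance of the EB estimate, sigma_i^2 = (1 - alpha_i) * pi_i."""
    return (1.0 - alpha) * expected


def eb_standard_deviation(alpha: float, expected: float) -> float:
    """Standard deviation, the square root of the EB variance."""
    return math.sqrt(eb_variance(alpha, expected))


# Example: relative weight 0.114 and EB expected crashes 9.1906.
print(round(eb_variance(0.114, 9.1906), 2))            # prints 8.14
print(round(eb_standard_deviation(0.114, 9.1906), 1))  # prints 2.9
```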

The variance of the data can also be calculated using the following equation:

[equation illegible in the source; the surviving terms are $\sigma_i^2$, $SPF_i$ and 1]

where all variables are as previously defined.

The index of effectiveness is a function of the previous parameters given by (8): 

where $\theta_i$ denotes the index of effectiveness and all other variables are as previously defined.

Finally, the relative difference in crash occurrence between actual and expected conditions is 
determined as (5): 

$$\text{relative difference in crash occurrence} = 100\,(1 - \theta_i)$$
where all variables are as previously defined and results are expressed as a percentage. 
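
The final two calculations can be sketched in Python. The relative difference follows the
expression $100(1 - \theta)$ given above. The index of effectiveness, however, is written here in a
form commonly used in EB before-after studies,
$\theta = (\sum \lambda / \sum \pi) / (1 + \sum \sigma^2 / (\sum \pi)^2)$; that form is an
assumption of this sketch and should be checked against the cited reference and the spreadsheet
formulas before use. All function names are hypothetical.

```python
from typing import Sequence


def index_of_effectiveness(
    actual: Sequence[float],
    expected: Sequence[float],
    variances: Sequence[float],
) -> float:
    """ASSUMED form: theta = (sum(actual) / sum(expected)) / (1 + sum(var) / sum(expected)^2)."""
    total_actual = sum(actual)
    total_expected = sum(expected)
    total_variance = sum(variances)
    if total_expected <= 0:
        raise ValueError("total expected crashes must be positive")
    return (total_actual / total_expected) / (1.0 + total_variance / total_expected ** 2)


def relative_difference(theta: float) -> float:
    """Relative difference in crash occurrence, in percent: 100 * (1 - theta)."""
    return 100.0 * (1.0 - theta)


# Illustrative values only: 8 actual crashes, 10 expected, variance 5.
theta = index_of_effectiveness([8.0], [10.0], [5.0])
print(round(relative_difference(theta), 1))  # prints 23.8
```

Summing the actual crashes, expected crashes and variances before dividing mirrors the column
totals that the spreadsheet uses for the combined result of each roadway type.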

3 USING THE EB EXCEL SPREADSHEET 

An Excel spreadsheet is provided to facilitate application of the Empirical Bayes process, 
previously outlined, for: (1) Interstate, (2) two-lane highway preservation and (3) two-lane 
highway restoration projects. The SPF and overall overdispersion parameter have already been 
established for each roadway type as part of this larger study. 

The first component of the spreadsheet provides an overview of the EB process as shown in
Figure 1. The second component of the spreadsheet consists of two tables that determine the
effectiveness of roadway treatments (based on the two alternate overdispersion parameter, $\phi_i$,
calculations) using the process outlined in Section 2. The user is required to input values into the
yellow highlighted cells in the worksheet shown in Figure 2. Once the information is entered in
the spreadsheet, the percent relative difference between actual and expected crashes will
automatically be determined. The information required for the spreadsheet includes:

• Project ID 

• Project Name 

• Milepost at the beginning of the project segment (MP Begin) and the Milepost at the end 
of the project segment (MP End) 

• AADT on the segment 

• Number of actual crashes on the segment 

Figure 1. Overview of the EB Method

[Spreadsheet screenshot: Interstate Resurfacing EB Results worksheet, with one table for each
overdispersion method. Columns: Years Data Collected, Project I.D., Project Name, MP Begin,
MP End, Segment Length (L), AADT, Actual Crashes, SPF Results, Overdispersion Parameter,
Relative Weight, Expected Accidents, Standard Deviation, Variance.]

Figure 2. User Inputs 



4 INTERPRETING THE RESULTS 



For a safety improvement to be noted, the number of crashes expected to occur on a roadway 
segment had no treatment been made should exceed the actual number of crashes observed on 
the roadway segment following treatment. The last two columns in the spreadsheet indicate if 
the actual number of crashes that occurred on each roadway segment is higher or lower than the 
number of crashes expected to occur without treatment. If the actual number of crashes 
occurring was lower, the results will be tabulated in the % Lower than Expected column and the 
text NA (not applicable) will appear in the % Higher than Expected column. The reverse effect 
will occur if the actual number of observed crashes exceeds the expected crash frequency. The 
results for all roadway reconstruction projects in combination are highlighted in green in the
spreadsheet and are located at the bottom of the % Lower than Expected and the % Higher than 
Expected columns (see Figure 3). 
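
The two result columns can be mimicked with a small Python sketch. It assumes that the input is
the percent relative difference, $100(1 - \theta)$, so that a positive value means fewer crashes
than expected; the function name and the one-decimal formatting are illustrative choices and do
not come from the spreadsheet.

```python
from typing import Tuple


def result_columns(relative_difference_pct: float) -> Tuple[str, str]:
    """Return the (% Lower than Expected, % Higher than Expected) cell text.

    A non-negative relative difference fills the "lower" column and puts NA
    in the "higher" column; a negative one does the reverse.
    """
    if relative_difference_pct >= 0:
        return f"{relative_difference_pct:.1f}", "NA"
    return "NA", f"{-relative_difference_pct:.1f}"


print(result_columns(23.8))   # prints ('23.8', 'NA')
print(result_columns(-12.5))  # prints ('NA', '12.5')
```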

[Spreadsheet screenshot: Interstate Resurfacing EB Results]

Figure 3. Relative Difference Display



5 LIMITATIONS OF THE EB METHOD SPREADSHEET 

The primary limitation of the EB method as applied in this spreadsheet is that the SPF was 
estimated using an aggregate three years of crash data (i.e., crash frequencies per three years per 
roadway segment). Hence, to accurately apply this SPF model, the units of crash frequencies per 
three years per roadway segment need to be maintained (i.e., annual crash data cannot be used in 
place of three-year aggregated data without re-estimating the SPF model). 
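
One way to guard against mixing units is to aggregate annual counts into a three-year total
before any SPF comparison is made. The following Python sketch is illustrative only; the function
name is hypothetical and the check for exactly three annual values is an assumption that follows
from the three-year basis of the SPF.

```python
from typing import Iterable


def three_year_total(annual_crashes: Iterable[int]) -> int:
    """Sum exactly three annual crash counts into one three-year total."""
    counts = list(annual_crashes)
    if len(counts) != 3:
        raise ValueError("expected exactly three annual crash counts")
    return sum(counts)


# Illustrative annual counts for one segment.
print(three_year_total([3, 4, 3]))  # prints 10
```
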

A more minor limitation pertaining to the use of the spreadsheet occurs when new rows of
information are added into Excel. All of the calculations for individual roadway segments will
remain unaltered; however, the final percent relative difference in crashes by roadway type could
be miscalculated if the user does not modify the cell formula to include the additional rows. The
user should double-check that the Actual Crashes, the Expected Crashes, and the Variance
columns are all summed correctly, so that the final results are accurate.